Meta-analyzing intelligence and religiosity associations: Evidence from the multiverse

Over the past century, a remarkable body of research about the relationship of intelligence and religiosity has accumulated. So far, the majority of studies that investigated this relationship showed a negative correlation, indicating lower cognitive abilities of individuals reporting stronger religious beliefs. Although the effect direction has been observed to be largely consistent across studies, the reported effect strength varied substantially across studies. Several potentially moderating variables such as different intelligence and religiosity assessment methods, educational status of samples, and participant sex have been proposed as likely candidates for explaining systematic differences in effect strengths. However, the effects of these moderators are to date unclear. Consequently, we focused in investigating effects of these moderating variables on the intelligence and religiosity link in an update of prior meta-analytical investigations in n = 89 (k = 105; N = 201,457) studies. Random-effects analyses showed a small but robust negative association between intelligence and religiosity r = -.14 (p < .001; 95% CI [-.17, -.12]). Effects were stronger for (i) psychometric intelligence tests than for proxy measures such as grade point averages and (ii) general population and college samples than pre-college samples. Moreover, we provide evidence from combinatorial, multiverse, and specification curve analyses that further corroborates the robustness of the investigated association. Out of 192 reasonable specifications all 135 (70.4%) significant summary effects were negative. In all, our results show small but robust negative associations between religiosity and intelligence that are differentiated in strength but generalize in terms of direction over moderating variables.


Introduction
Intelligence and religiosity associations have been investigated for more than 90 years by now. As early as in 1928, two studies [1,2] reported negative correlations between cognitive abilities and religiosity, indicating lower religiosity of more intelligent individuals. Subsequently, the association of intelligence and religiosity has been examined in several nations, by means of different intelligence measures, and with different methods to assess religiosity. On a national level, negative associations between intelligence and religiosity have been well-established.
Associations of national IQs with the respective countries' atheism rates indicated higher test performance in countries with larger self-reported atheistic population percentages [3]. However, the investigation of potentially moderating variables within countries has so far led to less unequivocal results. For instance, these associations have been shown to be differentiated according to quality of life, yielding meaningful positive associations in countries with higher [4] but mostly nill-effects in countries with lower quality of life [5]. Moreover, within countries, the findings appear to be less consistent in terms of strength and in certain cases even of the sign, yielding seemingly erratic patterns of results possibly depending on subjectlevel variables such as participant sex, age, or highest educational qualification [6].
The first formal quantitative synthesis of the intelligence-religiosity association [6] found stronger negative effects in women than in men. A reanalysis of these very same data [5] suggested that the negative intelligence-religiosity link is limited to female samples only. However, in a more recent meta-analysis, a robust association that generalized over sex was identified [7]. Other inconsistencies between these meta-analytical accounts emerged as well, indicating either mediating effects of education [5] vs. no mediation [7] or partial [5] vs. full mediation [7] of intelligence on the education-religiosity link.
Effects of education on the intelligence and religiosity link are conceptually plausible because on the one hand, education is well-known to be robustly associated with intelligence [8] and on the other hand, highest educational attainment has been found to correlate negatively with formal church attendance [9].
Two potential causes may be assumed that make differences in the strength of the intelligence and religiosity association plausible. On the one hand, weaker associations in college samples than in general population samples may be attributed to range restriction. Samples comprising college students can be seen as more homogeneous than the general population in terms of cognitive ability, education, religiosity and other variables. Therefore, effects of college samples may be expected to be lower than those of less homogenous general population samples.
On the other hand, weaker associations of college and pre-college samples than general population samples may be seen as a function of participant ages. Religiosity has been shown to increase with age, although the increase appears to be non-linear because the most substantial changes take place in the early adulthood, but seems to further increase in old age [10]. However, so far, the majority of published studies examined intelligence and religiosity in samples with comparatively young participants (for some exceptions, see [11][12][13]). Nonetheless, the available meta-analyses [5][6][7] indeed reported weakest effects for pre-college samples, while (stronger) effects of college and non-college samples may not be as differentiated.
Besides such varying subject-level characteristics, primary studies examining the association of intelligence and religiosity also differ in their methods to assess both concepts. Many studies investigated correlations of psychometric intelligence test results with self-reported religiosity [e.g., 11]. However, in some studies, intelligence has only been measured by a proxy such as academic achievement (e.g., by means of Grade Point Averages; GPA) [e.g., 14], thus introducing more statistical noise in the cognitive ability assessment. These differences have been shown to impact the effect strength of the relation to religiosity [6], because GPA-based assessments represent more crude indicators of cognitive abilities than psychometric intelligence tests do. Therefore, effect sizes can be expected to systematically differ in regard to intelligence assessment types and consequently represent another meaningful moderating variable of the intelligence and religiosity link which needs to be taken into account when synthesizing effects.
Primary studies also differed in terms of their religiosity assessments. While some authors assessed religious beliefs directly by means of self-report questionnaires or single items [e.g. 15], others asked their probands about their engaging in certain religious behaviors such as participation in religious campus organizations [16], going to church [e.g. 17], or the amount of contact with and dependence on the church community [e.g . 18]. Assessing religious beliefs as compared to religious behavior has been shown to moderate the association with intelligence [6]. This seems plausible because religious behaviors are often motivated by non-religious reasons. The desire for social involvement, belonging to or acceptance in a community may lead to engaging in such behaviors and therefore constitute a less accurate estimate of religiosity [19]. Consequently, intelligence correlations with religious beliefs may expected to be stronger than those with religious behaviors.
Finally, meta-analytical effect estimations necessarily depend on several decisions that are made by the authors of a given meta-analyses, such as the definition of inclusion criteria, subgrouping, or the use of different analytical approaches. In regard to the present research question, this is illustrated by the differing results of prior meta-analyses that used identical data but arrived at different conclusions [5,6] because of their different conceptual and analytical choices. Consequently, there may be many different reasonable ways to synthesize a certain set of data that pertain to a given research question. It has been demonstrated that even seemingly arbitrary decisions in data treatment and analysis can have substantial influences on the outcomes [20,21]. Multiverse and specification curve analyses represent useful tools that allow the assessment of a large range of different reasonable meta-analytical summary effects [22].

The present study
Here, we aim to investigate the strength of the intelligence and religiosity association by (i) assessing influences of specific conceptually plausible moderating and mediating variables and (ii) examining a large number of different reasonable approaches to synthesize this effect using multiverse and specification curve analyses. Moreover, we use several standard and novel methods to investigate potential dissemination bias in these data and examine evidence for cross-temporally declining effect sizes.
The study protocol including all hypotheses, sampling, and planned confirmatory analyses have been preregistered at https://osf.io/5r4qu prior to all data analyses (all deviations from the preregistered protocol are documented on the project page at https://osf.io/ka2ym/).

Literature search
Potentially relevant articles were searched in five databases that index published journal articles and books (Google Scholar, ISI Web of Knowledge, PsycINFO, Pubmed, Scopus) as well as the Open Access Dissertation and Theses (oadt.org) which allows identification of relevant items from the grey literature. We used the following search string to identify titles and abstracts of potentially includable articles within the databases: (IQ OR intelligence OR "cognitive ability") AND (religious OR religiosity OR "religious beliefs" OR spirituality). The title and abstracts of 1,862 potentially relevant articles were screened for relevance (for a flowchart, see Fig 1). Subsequently, full-texts of potentially relevant articles were obtained. The search process was originally carried out in July 2018 and was updated in September 2018. Another update was conducted in April 2021, applying the same search strategy.

Inclusion criteria
Studies had to meet four criteria in order to be eligible for inclusion in the present meta-analysis. First, correlations between religiosity or spirituality with scores from psychometric intelligence tests or academic achievement measures had to be reported. Second, samples had to comprise healthy participants. Third, reported data had to be independent of data of other included studies. In cases of data dependencies, inclusion of (i) larger and (ii) more recently published samples was preferred. Finally, primary studies had to be published in either English or German. A list of included studies is available in the Online Supplement (S1 Appendix).

Coding
Coding of studies into categories (type of religiosity measure: beliefs vs. behavior vs. mixed; type of intelligence measure: IQ vs. GPA vs. mixed; publication status: published vs. unpublished; sample type: pre-college vs. college vs. non-college) and recording of other relevant variables (publication year, effect sizes, their associated p-values) as well as sample characteristics (sample size, percentage of men within samples) was conducted twice by the same experienced researcher [FD]. Inconsistencies in coding were resolved by discussion with an independent researcher [JP]. Inclusion of correlations of religiosity with psychometric intelligence tests was preferred over those with less salient assessments of intelligence (e.g., GPA; if more than one correlation of a distinct measure was reported, correlations were averaged using ztransformations).
Primary study quality assessment. We assessed the quality of included studies by means of an adapted version of the Newcastle-Ottawa Quality Assessment Scale [23] for cross-sectional studies. Records were evaluated based on the representativeness of the sample, the sample size, handling of non-respondents as well as the extent of the response rate and ascertainment of the exposure, thus yielding an index of study quality ranging from 0 to 5 points. The adapted scale along with quality ratings for individual studies are detailed in the online supplementary materials (S2 Appendix).

Data analysis
Effect sizes were synthesized by means of precision-weighted random-effects models. As a descriptive measure of heterogeneity, the index I 2 was computed. This measure provides the percentage of the total variability which can be attributed to true variation between studies rather than sampling error. We interpreted I 2 -values ranging from 0 to 25% as indicative of trivial, 25-50% as small, 50-75% as moderate, and 75-100% as large heterogeneity in accordance with well-established guidelines [24].
Subgroup analyses. Potential influences of moderator variables were assessed using mixed-effects models (i.e., effect size estimates of each subgroup were based on random-effects models but between-subgroup comparisons were based on fixed-effect models). In a series of subgroup analyses, effect sizes were grouped according to the religiosity assessment type (beliefs vs. behavior vs. mixed), sample type (pre-college vs. college vs. non-college), and status of publication (published vs. unpublished). Moreover, we investigated potential influences of intelligence assessment type (categorized into: IQ vs. GPA vs. mixed assessment) in an exploratory (i.e., non-preregistered) analysis.
Meta-regression. Continuous moderators were examined by means of linear precisionweighted meta-regressions. Because recent evidence suggests overproportional numbers of declining effect strengths over time in empirical research regardless of the investigated research question [35], we examined potential influences of publication years on effect sizes. Therefore, we calculated two weighted meta-regressions for effect sizes on publication year based on all and on published effect sizes only. This approach was deemed appropriate because conceptually, decline effects are likely to be masked by results by the included unpublished effect sizes (see [35]). Moreover, effect sizes were regressed on percentage of men within samples for all studies for which sex breakups were available (k = 76). Finally, to test for a possible influence of primary study quality on reported estimates, meta-regressions of effect sizes on study quality were conducted.
Dissemination bias. We used several standard and more modern methods to assess potentially confounding effects of dissemination bias. Only published studies were included for dissemination bias analyses. First, funnel plots were visually inspected for indications of asymmetry. Second, Begg and Mazumdar's rank correlational method [36] was used to formally examine possible funnel plot asymmetry. By means of this method, primary study effect sizes and their precision are ordinally correlated and interpreted as indicative of publication bias if associations become significant (here, an alpha level of .10 is assumed). Third, Sterne and Egger's regression test [37] was employed. In this approach, standard normal deviates of effect sizes (the quotient of effect sizes and their standard errors) are regressed on study precision. In absence of publication bias, the intercept of the regression line should not significantly deviate from the origin.
Fourth, we used the Trim-and-Fill method [38] to estimate the number of missing studies on the asymmetric side of the funnel plot. Moreover, this approach provides an adjusted effect estimate which can be interpreted as a sensitivity analysis (i.e., large numerical deviations of standard and adjusted effect estimates indicate publication bias). Fifth, we tested for a potential excess of significant studies [39]. In this approach, the average power of the primary studies to detect the observed meta-analytic summary effect is calculated. Based on these power estimates, the expected frequencies of significant studies are calculated and compared with the number of observed significant studies (of note, signs of the significant primary study effect sizes need to be consistent with the summary effect) by means of a chi-squared test (i.e., significantly larger numbers of observed than expected studies being indicative of bias).
Finally, we used three novel methods for dissemination bias assessment (p-curve, p-uniform, p-uniform � ) that are based on similar ideas, but rely on different implementations. These methods allow the estimation of meta-analytical summary effects that are based on the p-value distributions and their associated degrees of freedom of significant published primary studies only. To this end, it is argued that selective reporting of studies is driven by statistical significance rather than the strength of effect sizes (as for instance assumed in the Trim-and-Fill procedure) [40]. Because significant studies should not be vulnerable to strategic submission behaviors such as file-drawering (i.e., all significant studies have the same probability of being published), effect estimations that are only based on significant p-values can be expected to remain unaffected by dissemination bias.
Specifically, p-curve [41] relies on the observed distribution of independent significant pvalues of a set of studies. Right skewed curves are indicative of evidential value of the observed set of studies (i.e., the meta-analytic summary effect differs significantly from zero), left skewed ones indicate p-hacking (i.e., a certain variety of questionable research practices; see [22]), and uniform distributions are indications of lacking evidential value (i.e., the accumulated evidence is insufficient to provide meaningful information about either the null or the alternative hypothesis). The p-curve distribution can be used to estimate summary effect sizes by recording the effect size that is associated with the theoretical density distribution of conditional pvalues that fits best to the observed empirical p-value distribution.
Another method, which is based on the same underlying logic but differs in its implementation is p-uniform. In this approach, dissemination bias is assessed by examining potential leftskew of the conditional p-value distribution for a fixed-effect model [42]. For the summary effect estimation, the conditional p-values of the observed p-value distribution are uniformized and their associated effect estimate is assumed to represent the meta-analytical summary effect.
However, a drawback of both p-curve and p-uniform is their vulnerability for large between-studies heterogeneity, therefore limiting their applicability in many cases. Consequently, we additionally used p-uniform � [43] which is a recently developed extension of puniform that improves the effect estimation when considerable between-studies heterogeneity is observed.
In contrast to p-uniform, effect estimation in p-uniform � is based on the p-values of significant as well as non-significant studies and improves the previously established p-value-based estimation methods three ways, namely: (i) because of larger numbers of includable effect sizes, estimates become more accurate (i.e., they show smaller variance), (ii) overestimation of summary effects due to large between-studies heterogeneity is reduced, and (iii) the method allows a formal assessment of between-studies heterogeneity. Effect estimation in this method broadly resembles selection model approaches but does not require estimations of weight functions. Moreover, it is based on the assumption that significant studies on the one hand and non-significant studies on the other hand possess the same publication probability, although these probabilities may differ between these two study types [for details, see 44]. Of note, p-uniform does not provide a formal test for publication bias, but supposedly corrects for potential bias in its effect estimations. The R code for all our analyses is available in the Online Supplement (S3 Appendix).
Combinatorial meta-analysis. Another meta-analytic idea similar to the multiverse-analysis or the specification-curve approach is combinatorial meta-analysis [45]. This method aims at calculating effect estimates for all 2 k -1 possible subsets of available data. It can be interpreted akin to sensitivity analyses, which are designed to identify outlier studies that overproportionally affect the summary effect estimates. On the one hand, this brute-force method is suited for identifying influential studies. On the other hand, it, blindly tests all possible study subsets, although most of them would not be considered as representing reasonable selections.
A maximum number of 105 available samples yields over 40 trillion unique subsets (2105-1 = 40,564,819,207,303,340,847,894,502,572,032) for an exhaustive combinatorial meta-analysis. We drew a random sample of 100,000 different subsets representative for the full set. To this end, we used a stratified approach to oversample studies with the smallest and largest effects to be able to meaningfully assess outlier influences.
Advantages of multiverse and specification curve analyses over combinatorial meta-analyses consist in their theoretical and conceptual proceeding as well as in their potential variation of the (meta-analytical) technique (e.g. fixed-effect vs. random-effects modelling).
Multiverse and specification curve analyses. In terms of the multiverse and specification curve analyses we distinguished between three internal, or "Which" factors (i.e., which data were meta-analyzed) and two external, or "How" factors (i.e., how were the data metaanalyzed).
The internal (i.e., "Which") factors were: (i) type of religiosity assessment (beliefs, behavior, mixed, or a combination of all), (ii) sample type (college, non-college, pre-college, or a combination of all), and (iii) status of publication (published, unpublished, or a combination of both). Accordingly, these specifications yielded 4 � 4 � 3 = 48 combinations.
In total, data and analysis specifications yielded 12 � 48 = 576 ways to include and analyze the data. We restricted analyses to unique combinations with at least two studies.
Inferential test of the specification-curve meta-analysis. To inferentially test if the (descriptive) meta-analytic specification curve indicated rejecting the null hypothesis of no effect, we used a parametric bootstrap approach. Study features for each primary study were regarded as fixed, but new effect sizes were assigned randomly assuming that the null hypothesis is true. In order to do so, values were drawn from a normal distribution with an expected value equivalent to zero, but a varying standard deviation, which corresponded to the sample's observed standard error (obtained via a fixed-effect model). This specification-curve analysis was repeated 999 times. The resulting 1,000 bootstrapped specification curves were then used to obtain the pointwise lower and upper limits (2.5% and 97.5% respectively) for each specification number, which constitute the boundaries of the null hypothesis. When the limits did not include zero, the evidence was considered to be indicative of a non-nill summary effect.
Mediation analyses. A series of meta-analytical mediation models allowed us to investigate potential mediating effects of education and cognitive styles (i.e., analytic vs. intuitive) on the intelligence and religiosity link. We used random effects models and weighted least square estimations in the two-stage method for indirect effect estimation, following the approach of Cheung [46]. For these analyses, we used the R-package metaSEM [47].

Final sample
Our final sample consisted of 89 studies comprising k = 105 independent samples (88 published vs. 17 unpublished effect sizes; N = 201,457) covering the time span from 1928 to 2020 (study characteristics are detailed in Table 1; the full data set is available at https://osf.io/ka2ym/). Most studies used psychometric intelligence tests to assess cognitive abilities (k = 93) whilst the rest used grade point averages as a proxies (k = 8) or a combination of both (k = 4). Most studies (k = 67) assessed self-reported religious beliefs, whilst the others used self-reported behaviors (k = 11) or a composite of beliefs and behaviors (k = 27). The majority of samples consisted of college students (k = 49), followed by general population (k = 39), pre-college samples (k = 13), and four mixed cases. A checklist of our study outline according to the PRISMA guidelines [48] can be accessed in the S4 Appendix.

Results
Random-effects analyses yielded an overall effect of r = -.14 (p < .001; 95% CI [-.17, -.12]; see first line in Table 2). We provide a rainforest plot of our main analysis in Fig 2 [ without the leverage points yielded virtually identical results. Considering specification curve and multiverse analysis, this constitutes mere two specifications, while there are various alternative specifications.

Combinatorial meta-analysis
In Fig 3 the combinatorial meta-analyses are visualized (GOSH plot). As mentioned earlier, we drew 100,000 random subsets from a larger number of theoretically possible subsets. In general, effect heterogeneity was high, especially when at least one of the two outliers was included (highlighted in red). Clearly, the outlying studies contributed considerably to overall-heterogeneity which was supported by density estimates of the effect distributions. The density estimates of the effect size considerably deviated from zero, suggesting a true non-nill effect.

Multiverse and specification-curve analysis
The descriptive meta-analytic specification-curve plot is provided in Fig 4. Almost all metaanalytic specifications yielded a negative summary effect. Out of 576 possible specifications, 192 comprised more than a single study, yielding 135 (70.4%) nominally significant (p < .05) negative summary effects. None of these specifications led to a significant positive effect. The observed results clearly deviate from the under-the-null scenario of an underlying zero effect (Fig 5), indicating a robust negative association. Fig 6 displays the histogram of the p-value distribution for the summary effect of the various meta-analytic specifications. There is an obvious excess of p-values smaller than .05, further corroborating the above evidence.

Moderator analyses
Descriptive statistics of summary effects of all subgroup analyses are provided in Table 2.
Intelligence assessments. Mixed-effects models were used to investigate if religiosity associations with standardized intelligence test scores differed from those with academic achievement (as assessed by Grade Point Average: GPA). Religiosity showed significantly stronger associations with intelligence test scores than with GPA-based assessments (Q = 11.03, df = 1; p < .001). Associations with mixed assessments yielded correlation Consequently, we report all further analyses for both overall data and the subset of studies that used psychometric intelligence assessments-only to account for potential differential effects of moderating influences of assessment type (because of small case numbers, no separate analyses are provided for GPA-only or mixed assessments).
A meta-regression of effect sizes on the proportion of men (Fig 7) revealed a significant positive influence on sex (b = 0.22; R 2 = .18, p = .002), indicating stronger links of intelligence and religiosity in men than in women. When conducting this meta-regression solely with studies using psychometric intelligence tests, the influence remained significant (b = 0.18; R 2 = .12, p = .015).
Publication type. Differences in effect size estimates of published and unpublished studies conformed to the expected direction, showing smaller (albeit nominally non-significant) effects for unpublished studies (Q = 0.24, df = 1; p = .62). Again, results of studies that only used psychometric intelligence test scores were virtually identical, showing non-significantly larger absolute effects of published than unpublished studies (Q = 0.54, df = 1; p = .46). However, non-significant results of direct comparisons between published and unpublished effect sizes were to be expected, because of the (typically) considerably smaller number of unpublished studies (83.81% published vs. 16.19% unpublished) and the corresponding large confidence interval of their summary effect.
Multiple moderation analysis. We tested the effects of sex, year, and publication status in a multiple moderation analysis. No significant influences of these variables on the intelligence and religiosity link were observed (proportion of men: b = -0.217 [-.847; .414], p = .50, year of .112], p = .45). Moreover, none of these variables had an effect in simple moderation analyses (except in certain cases for sex, which showed seemingly erratic patterns of moderating influences), thus further corroborating the remarkable generalizability of the intelligence and religiosity link across moderators.
Dissemination bias. Only published study effects were included in all dissemination bias analyses (k = 88). Visual inspection of the funnel plot (Fig 8) did not show any obvious signs for asymmetry, excepting one outlier that had been already identified in our influence diagnostics. Formal tests of funnel plot asymmetry were consistent with this assessment. Neither Trim-and-Fill analysis (no indication of missing studies on the right side of the funnel plot), Sterne and Egger's regression test (Z = -1.00, p = .32), nor the rank correlation method (τ = -0.121, p = .10) yielded any evidence for funnel plot asymmetry.
In order to test for excessive significance, the average power based on the observed study effect was calculated (within-study average power = 63%). Consequently, the number of expected significant studies in the hypothesis-conforming direction was 66. There were less studies with a significant outcome observed than what would have been expected based on our power calculations, thus indicating no evidence for bias.
Both p-curve and p-uniform analyses were based on 55 published significant studies. We observed a right-skewed p-curve (Fig 9) with more small (i.e., values that were close to p = .01) than large p-values (i.e., values that were close to p = .05), indicating that the present body of research holds empirical evidence and the extent of p-hacking is negligible. Broadly in line with our findings from our random-effects analysis, p-curve yielded an effect size estimate of r = -. 16.
In our p-uniform analyses the pp-value distribution of the fixed-effect estimate did not differ substantially from the uniform distribution (p = .67), indicating no evidence for dissemination bias. The p-uniform effect estimate was similar to the estimation based on our randomeffects analysis and p-curve r = -.17 (95% CI [-.20; -.15]; p < .001). The slightly larger effect estimates of p-curve and p-uniform compared to the random-effects analysis was to be https://doi.org/10.1371/journal.pone.0262699.g008 expected and is most likely due to the typically observed effect overestimations of these two methods in presence of non-trivial between-studies heterogeneity [52].
The p-uniform � -based estimate was similar to the p-curve and p-uniform estimates, yielding r = -. 16. Consequently, all our used dissemination bias detection and effect estimation methods convergently pointed towards negligible effects of dissemination bias in our present meta-analysis.
Time trends. A weighted meta-regression of effect sizes on study publication years showed neither effect strength declines over time in all (b < 0.001; Q = 0.54, R 2 < .01, p = . 46) or only published effect sizes (i.e., published effects should be more prone to show a decline effect; k = 88; b < 0.001; Q = 0.39, R 2 < .01, p = .63; Fig 10). Interestingly, the signs of the regression coefficients were indicative of increases in effect strength, thus ruling decline effects in the present study out altogether [35].
Primary study quality assessments. A simple regression of effect sizes on study quality showed no significant effect (b = -0.004, R 2 < .01, p = .75). We repeated this analysis in a precision-weighted meta-regression and obtained virtually identical results (b > -0.001, R 2 < .01, p = .93), thus corroborating that study quality did not affect the observed associations.

Education
We conducted mediation analyses to examine potentially mediating effects of education on the intelligence and religiosity link. College samples were excluded from these analyses, because they
Formal tests in meta-analytical mediation models once more supported our findings from the correlation analyses. Both indirect effects were negative as well as significant (b = -0.080;

Discussion
In the present meta-analysis, we provide evidence for robust, albeit small, associations between intelligence and religiosity. Convergent results from a large number of reasonable specifications indicate that these effects generalize and remain robust even when accounting for different moderators, with some allowance to be made for varying effect strengths. The intelligence and religiosity link appears to be therefore virtually ubiquitous, although the presently identified differences in strength present several points of interest, as we discuss below.
First, the intelligence and religiosity link appeared to be stronger when psychometric intelligence was assessed compared to less salient proxies of intelligence such as GPA. This seems reasonable because achievement measures such as GPA are noisy measures of cognitive abilities. School grades are well-known to be driven to a considerable extent by factors such as motivation [53], school environment [54], or socioeconomic status [55], consequently yielding a performance indicator that represents a suboptimal measure of competency. The observation that religiosity correlations with academic achievement measures conform in terms of the sign to correlations with intelligence once more shows the robustness of this link.
Second, the type of religiosity measure that had been used in primary studies appeared to moderate the observed associations, although nominal significance was marginally not reached in our subgroup analysis. All presently included studies used self-reports to assess religiosity, but those that were framed on religious beliefs yielded larger correlations than religious behaviors. This may be attributed to religious behavior (e.g., praying, church attendance, belonging to a denomination or not) being a poorer measure of an individual's religiosity than selfreported beliefs, because behaviors may be motivated by the desire to conform to an in-group instead of religiosity (i.e., religious group-membership and its related behavioral expectations) https://doi.org/10.1371/journal.pone.0262699.g011 [56]. These results reemphasize the importance of distinguishing between intrinsic (i.e., beliefs) and extrinsic motivation (i.e., behavior) when assessing religiosity [57].
Third, the educational status of samples moderated the strength of intelligence and religiosity associations. Correlations of non-college samples were substantially stronger than those of college samples, but differences disappeared when range restriction was accounted for. This means, that previously observed lower associations in better educated samples [6] are most likely an artifact of homogeneous (attenuated) samples.
Although college and non-college samples appear to show virtually identical associations, effects of pre-college samples were substantially smaller, indicating only a trivial intelligence and religiosity association in children and adolescents. A possible explanation for this observation could be that religious worldviews become only fully developed in more mature ages [58]. Although due to the correlational nature of the synthesized studies, causality cannot be inferred, it may speculated that intelligence (or at least one of its covariates) is responsible for the development of certain religious beliefs, rather than the other way around. Longitudinal data are needed to test these ideas.
Fourth, sex had been previously observed to be a potentially influential variable with women showing typically larger effects than men [5], although not all analytic approaches supported this interpretation [6,7]. Our results mirror these inconsistent findings, with continuous assessments indicating larger effects for men, but categorical assessments indicating larger effects for women. Considering prior evidence of virtually nill-effects of within-studies sex differences [6,7], this means most likely that sex does not play a meaningful role for intelligence and religiosity associations altogether.
Fifth, we did not find significant changes in study effect strengths over time, thus contrasting findings of a previous systematic review [5]. There was no evidence for dissemination bias that may have confounded the intelligence and religiosity link. Direct comparisons of published and unpublished studies neither yielded significant nor meaningful differences between subset summary effects. Moreover, application of several standard and modern dissemination bias detection methods as well as novel effect estimation methods showed convergent evidence for the validity and the robustness of the observed summary effect. This means that, although the strength of the intelligence and religiosity association appears to be differentiated over a number of moderators, the observed negative link is not confounded by research processrelated artifacts and generalizes over subgroups. Importantly, primary study quality did not emerge as a meaningful predictor for the intelligence and religiosity link either, thus largely ruling out potential effects of primary study design features.

Education
Based on evidence from longitudinal data, it has been hypothesized, that between-individual differences in education rather than intelligence are most likely the drivers of the intelligence and religiosity link [59,60]. We found that education does not mediate the intelligence and religiosity link, thus contrasting this idea and an earlier meta-analytical account [5]. Interestingly though, we observed a full mediation of intelligence on the education and religiosity association, indicating that intelligence but not education is a more likely driver of intelligence and thus corroborating findings from a previous meta-analysis [7].

Cognitive style
Although the current meta-analytic data do not allow drawing conclusions about causal inferences, it should be acknowledged, that the available evidence indicates intelligence impacting religiosity rather than the other way around. Whilst cognitive ability can be measured quite reliably at an early age and is a good predictor of intelligence in adulthood [55], religious involvement is less stable [61] and early assessed religiosity is a weak predictor of later religiosity [58]. In longitudinal studies [62][63][64][65], intelligence was measured several years before religiosity. In a previous review [6], summarizing these results led to a negative correlation of intelligence and religiosity when considering only belief-based measures, supporting a model in which intelligence drives religiosity.
On the whole, several plausible causes for a negative association of intelligence and religiosity have been suggested [6]. In the present meta-analysis, the role of cognitive style in the context of single and dual-process models of the mind is of particular interest, because we were able to empirically examine effects of analytic and intuitive styles on the intelligence and religiosity link.
According to the dual-process model of the mind [66,67] there are two different systems of information processing. One outputs intuitive judgments, because it is responsible for automatic information processing. The other one is responsible for systematic, analytic information processing. It has been shown that individuals differ in their tendencies to rely on one of these processing styles [67]. From this perspective, it could be argued that individuals that use a more intuitive (and therefore experiential) and a less analytic (and therefore rational) information processing style may be more accessible to religious ideals. This interpretation is supported by the observations that individuals that reported stronger beliefs in God showed lower analytical processing styles [68], whilst more intelligent people were found to prefer analytical over intuitive processing [69]. Our results in terms of mediating influences of cognitive style on the religiosity and intelligence association are consistent with this idea.
However, recent studies promote single-process models where analytic and intuitive cognitive styles represent the extremes on a continuum, thus adopting a dimensional rather than a categorical view of cognitive styles (for an overview, see [67]). Arguably, the observed partial cognitive style mediation of the intelligence and religiosity association may be seen to fit better to a single process rather than a dual-process model, because the cognitive style variables were treated as a dimensional rather than a categorical concept in primary studies and our metaanalysis.

Limitations
Some limitations of the present meta-analysis need to be acknowledged. It would have been desirable to investigate if belongingness to different religious denominations influences the intelligence and religiosity association. For instance, Catholicism or Judaism are characterized by a strong embeddedness of believers in their community. Common religious practices and rituals including other people are an inherent part of religious identity [70]. American Protestantism on the other hand is more individualistic. These differences give reason to assume that the stronger negative association of intelligence and religious beliefs (in contrast to religious behavior) may be most pronounced for American Protestants [6], but perhaps less so for other denominations. Unfortunately, due to insufficient reporting in primary studies, potential moderating effects of religious affiliations could not be assessed. In a similar vein, the predominant part of included studies was conducted in Western countries, especially in the United States, therefore limiting the generalizability of our findings to Western contexts. Moreover, it was not possible to assess potential effects of conceptual differences between spirituality and religiosity in the present study. For example, as individuals age, religiosity seems to increase stronger in Catholics than in Protestants [66] which may be related with a greater social embeddedness and therefore extrinsic motives to turn to religion. Such extrinsic reasons for religious behaviors are an important differentiating factor of religion and spirituality. It needs to be acknowledged that these terms represent conceptually non-identical constructs that, however, typically largely empirically overlap [71].
Because of sometimes uncertain dimensionality of assessment instruments and certain suboptimal methodical features of primary studies (e.g. single item assessments asking about beliefs in supernatural agents [72]), correlations of IQ with religiosity can be expected to be confounded by certain facets of spirituality in the present study. Although it cannot be ruled out that this ambiguity may have introduced some statistical noise into the present data, the considerable empirical overlap of these two constructs should largely alleviate concerns about the robustness of our results. In fact, it speaks even more for the robustness of our findings, that even in presence of these potential confounders, our results convergently indicated meaningful negative associations between intelligence and religiosity.

Conclusion
The results of the present meta-analysis demonstrate a clear-albeit small-and almost ubiquitous negative association of religiosity with intelligence. This negative link was most pronounced when results of psychometric intelligence tests were correlated with self-reported beliefs and seemed to be strongest in members of the adult general population. In all, the convergent evidence from more standard and several modern approaches to meta-analyses, such as combinatorial, multiverse, and specification-curve analyses, as well as results from a large number of bias assessment methods indicated a remarkable robustness of this effect that generalizes in direction and meaningfulness (although not strength) across moderators.